AITopics | architectural parameter

Collaborating Authors

architectural parameter

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

A Preliminaries on Transformers

Neural Information Processing SystemsAug-17-2025, 04:55:46 GMT

Perplexity is a widely used metric for evaluating the performance of autoregressive language models. This metric encapsulates how well the model can predict a word.

artificial intelligence, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

Learning Novel Transformer Architecture for Time-series Forecasting

Zhang, Juyuan, Zhu, Wei, Gao, Jiechao

arXiv.org Artificial IntelligenceFeb-19-2025

Despite the success of Transformer-based models in the time-series prediction (TSP) tasks, the existing Transformer architecture still face limitations and the literature lacks comprehensive explorations into alternative architectures. To address these challenges, we propose AutoFormer-TS, a novel framework that leverages a comprehensive search space for Transformer architectures tailored to TSP tasks. Our framework introduces a differentiable neural architecture search (DNAS) method, AB-DARTS, which improves upon existing DNAS approaches by enhancing the identification of optimal operations within the architecture. AutoFormer-TS systematically explores alternative attention mechanisms, activation functions, and encoding operations, moving beyond the traditional Transformer design. Extensive experiments demonstrate that AutoFormer-TS consistently outperforms state-of-the-art baselines across various TSP benchmarks, achieving superior forecasting accuracy while maintaining reasonable training efficiency.

architecture, opération, search space, (12 more...)

arXiv.org Artificial Intelligence

2502.13721

Country:

North America > United States > Virginia (0.04)
North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
Asia > China > Hong Kong (0.04)
(5 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

The devil is in discretization discrepancy. Robustifying Differentiable NAS with Single-Stage Searching Protocol

Subbotko, Konstanty, Jablonski, Wojciech, Bilinski, Piotr

arXiv.org Artificial IntelligenceMay-26-2024

Neural Architecture Search (NAS) has been widely adopted to design neural networks for various computer vision tasks. One of its most promising subdomains is differentiable NAS (DNAS), where the optimal architecture is found in a differentiable manner. However, gradient-based methods suffer from the discretization error, which can severely damage the process of obtaining the final architecture. In our work, we first study the risk of discretization error and show how it affects an unregularized supernet. Then, we present that penalizing high entropy, a common technique of architecture regularization, can hinder the supernet's performance. Therefore, to robustify the DNAS framework, we introduce a novel single-stage searching protocol, which is not reliant on decoding a continuous architecture. Our results demonstrate that this approach outperforms other DNAS methods by achieving 75.3% in the searching stage on the Cityscapes validation dataset and attains performance 1.1% higher than the optimal network of DCNAS on the non-dense search space comprising short connections. The entire training process takes only 5.5 GPU days due to the weight reuse, and yields a computationally efficient architecture. Additionally, we propose a new dataset split procedure, which substantially improves results and prevents architecture degeneration in DARTS.

architecture search, search space, supernet, (17 more...)

arXiv.org Artificial Intelligence

2405.1661

Country: Europe > Poland > Masovia Province > Warsaw (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.72)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.59)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.48)

Add feedback

Weight-Entanglement Meets Gradient-Based Neural Architecture Search

Sukthanker, Rhea Sanjay, Krishnakumar, Arjun, Safari, Mahmoud, Hutter, Frank

arXiv.org Artificial IntelligenceDec-16-2023

Weight sharing is a fundamental concept in neural architecture search (NAS), enabling gradient-based methods to explore cell-based architecture spaces significantly faster than traditional blackbox approaches. In parallel, weight entanglement has emerged as a technique for intricate parameter sharing among architectures within macro-level search spaces. Since weight-entanglement poses compatibility challenges for gradient-based NAS methods, these two paradigms have largely developed independently in parallel sub-communities. This paper aims to bridge the gap between these sub-communities by proposing a novel scheme to adapt gradient-based methods for weight-entangled spaces. This enables us to conduct an in-depth comparative assessment and analysis of the performance of gradient-based NAS in weight-entangled search spaces. Our findings reveal that this integration of weight-entanglement and gradient-based NAS brings forth the various benefits of gradient-based methods (enhanced performance, improved supernet training properties and superior any-time performance), while preserving the memory efficiency of weight-entangled spaces. The code for our work is openly accessible here. The concept of weight-sharing in Neural Architecture Search (NAS) arose from the need to improve the efficiency of conventional blackbox NAS algorithms, which demand significant computational resources to evaluate individual architectures. Here, weight-sharing (WS) refers to the paradigm by which we represent the search space with a single large supernet, also known as the one-shot model, that subsumes all the candidate architectures in that space. Every edge of this supernet holds all the possible operations that can be assigned to that edge. Gradient-based NAS algorithms (or optimizers), such as DARTS (Liu et al., 2019), GDAS (Dong and Yang, 2019) and DrNAS (Chen et al., 2021b), assign an architectural parameter to every choice of operation on a given edge of the supernet.

architecture, international conference, search space, (16 more...)

arXiv.org Artificial Intelligence

2312.1044

Country:

Europe > Germany > Baden-Württemberg > Freiburg (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)

Add feedback

SSS3D: Fast Neural Architecture Search For Efficient Three-Dimensional Semantic Segmentation

Therrien, Olivier, Amein, Marihan, Xiong, Zhuoran, Gross, Warren J., Meyer, Brett H.

arXiv.org Artificial IntelligenceApr-21-2023

We present SSS3D, a fast multi-objective NAS framework designed to find computationally efficient 3D semantic scene segmentation networks. It uses RandLA-Net, an off-the-shelf point-based network, as a super-network to enable weight sharing and reduce search time by 99.67% for single-stage searches. SSS3D has a complex search space composed of sampling and architectural parameters that can form 2.88 * 10^17 possible networks. To further reduce search time, SSS3D splits the complete search space and introduces a two-stage search that finds optimal subnetworks in 54% of the time required by single-stage searches.

artificial intelligence, machine learning, randla-net, (15 more...)

arXiv.org Artificial Intelligence

2304.11207

Country:

North America > United States > California > San Francisco County > San Francisco (0.15)
North America > Canada > Quebec > Montreal (0.14)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (0.40)

Industry:

Information Technology (0.67)
Transportation > Ground > Road (0.46)
Automobiles & Trucks (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (0.94)

Add feedback

Systolic-CNN: An OpenCL-defined Scalable Run-time-flexible FPGA Accelerator Architecture for Accelerating Convolutional Neural Network Inference in Cloud/Edge Computing

Dua, Akshay, Li, Yixing, Ren, Fengbo

arXiv.org Artificial IntelligenceDec-5-2020

This paper presents Systolic-CNN, an OpenCL-defined scalable, run-time-flexible FPGA accelerator architecture, optimized for accelerating the inference of various convolutional neural networks (CNNs) in multi-tenancy cloud/edge computing. The existing OpenCL-defined FPGA accelerators for CNN inference are insufficient due to limited flexibility for supporting multiple CNN models at run time and poor scalability resulting in underutilized FPGA resources and limited computational parallelism. Systolic-CNN adopts a highly pipelined and paralleled 1-D systolic array architecture, which efficiently explores both spatial and temporal parallelism for accelerating CNN inference on FPGAs. Systolic-CNN is highly scalable and parameterized, which can be easily adapted by users to achieve up to 100% utilization of the coarse-grained computation resources (i.e., DSP blocks) for a given FPGA. Systolic-CNN is also run-time-flexible in the context of multi-tenancy cloud/edge computing, which can be time-shared to accelerate a variety of CNN models at run time without the need of recompiling the FPGA kernel hardware nor reprogramming the FPGA. The experiment results based on an Intel Arria/Stratix 10 GX FPGA Development board show that the optimized single-precision implementation of Systolic-CNN can achieve an average inference latency of 7ms/2ms, 84ms/33ms, 202ms/73ms, 1615ms/873ms, and 900ms/498ms per image for accelerating AlexNet, ResNet-50, ResNet-152, RetinaNet, and Light-weight RetinaNet, respectively. Codes are available at https://github.com/PSCLab-ASU/Systolic-CNN.

cnn model, computation, systolic-cnn, (13 more...)

arXiv.org Artificial Intelligence

2012.03177

Country: North America > United States > Arizona (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

A version of the BERT language model that's 20 times as fast

#artificialintelligenceDec-4-2020, 08:05:55 GMT

In natural-language understanding (NLU), the Transformer-based BERT language model is king. Its high performance on multiple tasks has strongly influenced contemporary NLU research. On the other hand, it is a relatively big and slow model, which makes it unsuitable for some applications. Multiple efforts have been made to compress the BERT architecture, but the choice of architectural parameters (the number of layers, the number of processing nodes per layer, and so on) has been somewhat arbitrary, and the resulting models are rarely much better than the original at optimizing the balance between the model's size, speed, and error rate. A few weeks ago, we released part of the code for Bort, a highly optimized language model (LM) extracted from the BERT architecture through a combination of two rigorous algorithmic techniques especially designed for neural-network compression.

algorithm, architectural parameter, bert, (15 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.70)

Add feedback

Amazon's BERT Optimal Subset: 7.9x Faster & 6.3x Smaller Than BERT

#artificialintelligenceNov-8-2020, 20:05:04 GMT

The transformer-based architectures BERT has in recent years demonstrated the efficacy of large-scale pretrained models for tackling natural language processing (NLP) tasks such as machine translation and question answering. BERT's large size and complex pretraining process however raise usability concerns for many researchers. In a new paper, a pair of Amazon Alexa researchers extract an optimal subset of architectural parameters for the BERT architecture by applying recent breakthroughs in algorithms for neural architecture search. The proposed optimal subset, "Bort," is just 5.5 percent the effective size of the original BERT-large architecture (not counting the embedding layer), and 16 percent of its net size. Many attempts have been made to extract a simpler sub-architecture of BERT that maintains similar performance to its predecessor while simplifying the pretraining process and shortening inference time. Yet the performance of such sub-architectures is still being surpassed by the original implementation in terms of accuracy, the researchers say, and the choice of the set of architectural parameters in these works often appears to be arbitrary.

architectural parameter, architecture, bert, (8 more...)

#artificialintelligence

Country: Asia > China (0.08)

Genre: Research Report > New Finding (0.73)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.57)

Add feedback

This New BERT Is Way Faster & Smaller Than The Original

#artificialintelligenceNov-3-2020, 07:15:32 GMT

Recently, the researchers at Amazon introduced an optimal subset of the popular BERT architecture for neural architecture search. This smaller version of BERT is known as BORT and is able to be pre-trained in 288 GPU hours, which is 1.2% of the time required to pre-train the highest-performing BERT parametric architectural variant, RoBERTa-large. Since its inception, BERT has achieved several groundbreaking tasks in the field of natural language processing (NLP) and natural language understanding (NLU). It has made a resounding impact in the area of language modelling, as well. However, several times, the usability of BERT has been considered an issue for various serious concerns, such as its larger size, slow inference time, complex pre-training process, among others.

artificial intelligence, machine learning, natural language, (15 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.52)

Add feedback

Filters

Collaborating Authors

architectural parameter

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

9949e6906be6448230cdba9a4cb2d564-Supplemental-Conference.pdf

A Preliminaries on Transformers

Learning Novel Transformer Architecture for Time-series Forecasting

The devil is in discretization discrepancy. Robustifying Differentiable NAS with Single-Stage Searching Protocol

Weight-Entanglement Meets Gradient-Based Neural Architecture Search

SSS3D: Fast Neural Architecture Search For Efficient Three-Dimensional Semantic Segmentation

Systolic-CNN: An OpenCL-defined Scalable Run-time-flexible FPGA Accelerator Architecture for Accelerating Convolutional Neural Network Inference in Cloud/Edge Computing

A version of the BERT language model that's 20 times as fast

Amazon's BERT Optimal Subset: 7.9x Faster & 6.3x Smaller Than BERT

This New BERT Is Way Faster & Smaller Than The Original